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POISSON POINT PROCESS MODELS SOLVE 
THE "PSEUDO-ABSENCE PROBLEM" FOR 
PRESENCE-ONLY DATA IN ECOLOGY 1 

By David I. Warton and Leah C. Shepherd 

University of New South Wales 

Presence-only data, point locations where a species has been 
recorded as being present, are often used in modeling the distribu- 
tion of a species as a function of a set of explanatory variables — 
whether to map species occurrence, to understand its association 
with the environment, or to predict its response to environmental 
change. Currently, ecologists most commonly analyze presence-only 
data by adding randomly chosen "pseudo-absences" to the data such 
that it can be analyzed using logistic regression, an approach which 
has weaknesses in model specification, in interpretation, and in imple- 
mentation. To address these issues, we propose Poisson point process 
modeling of the intensity of presences. We also derive a link between 
the proposed approach and logistic regression — specifically, we show 
that as the number of pseudo-absences increases (in a regular or uni- 
form random arrangement), logistic regression slope parameters and 
their standard errors converge to those of the corresponding Poisson 
point process model. We discuss the practical implications of these 
results. In particular, point process modeling offers a framework for 
choice of the number and location of pseudo-absences, both of which 
are currently chosen by ad hoc and sometimes ineffective methods in 
ecology, a point which we illustrate by example. 

1. Background. Pearce and Boyce (2006) define presence-only data as 
"consisting only of observations of the organism but with no reliable data on 
where the species was not found. Sources for these data include atlases, mu- 
seum and herbarium records, species lists, incidental observation databases 
and radio-tracking studies." Note that such data arise as point locations 
where the organism is observed, which we denote as y in this article. An 
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Fig. 1. (a) Example presence- only data — atlas records of where the tree species An- 
gophora costata has been reported to be present, west of Sydney, Australia. The study 
region is shaded, (b) A map of minimum temperature (°C) over the study region. Vari- 
ables such as this are used to model how intensity of A. costata presence relates to the 
environment, (c) A species distribution model, modeling the association between A. costata 
and a suite of environmental variables. This is the fitted intensity function for A. costata 
records per km 2 , modeled as a quadratic function of four environmental variables using a 
point process model as in Section 4- 



example is given in Figure 1(a). This figure gives all locations where a par- 
ticular tree species (Angophora costata) has been reported by park rangers 
since 1972, within 100 km of the Greater Blue Mountains World Heritage 
Area, near Sydney, Australia. Note that this does not consist of all loca- 
tions where an Angophora costata tree is found — rather it is the locations 
where the species has been reported to be found. We would like to use these 
presence points, together with maps of explanatory variables describing the 
environment (often referred to in ecology as "environmental variables"), to 
predict the location of A. costata and how it varies as a function of explana- 
tory variables (Figure 1). 

Presence-only data are used extensively in ecology to model species distribu- 
tions — while the term "presence-only data" was rarely used before the 1990s, 
ISI Web of Science reports that it was used in 343 publications from 2005 
to 2008. The use of presence-only data in modeling is a relatively recent 
development, presumably aided by the movement toward electronic record 
keeping and recent advances in Geographic Information Systems. One rea- 
son for the current widespread usage of presence-only data is that often this 
is the best available information concerning the distribution of a species, as 
there is often little or no information on species distribution being available 
from systematic surveys [Elith and Leathwick (2007)]. 

Species distribution models, sometimes referred to as habitat models or 
habitat classification models [Zarnetske, Edwards and Moisen (2007)], are 
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regression models for the likelihood that a species is present at a given loca- 
tion, as a function of explanatory variables that are available over the whole 
study region. Such models are used to construct maps predicting the full spa- 
tial distribution of a species [given GIS maps of explanatory variables such 
as in Figure 1(b)]. When surveys have recorded the presence and absence 
of a species in a pre-defined study area ("presence/absence data"), logis- 
tic regression approaches and modern generalizations [Elith, Leathwick and 
Hastie (2008)] are typically used for species distribution modeling. If instead 
presence-only data are to be used in species distribution modeling, then a 
common approach to analysis is to first create "pseudo-absences," denoted 
as yo, usually achieved by randomly choosing point locations in the region of 
interest and treating them as absences. Then the presence/pseudo- absence 
data set is analyzed using standard analysis methods for presence/absence 
data [Pearce and Boyce (2006); Elith and Leathwick (2007)], which have 
been used in species distribution modeling for a long time [Austin (1985)]. 
Ward et al. (2009) recently proposed a modification of the pseudo-absence 
logistic regression approach for the analysis of presence-only data, when the 
probability tt that a randomly chosen pseudo-absence point of a presence is 
known. However, ir is not known in practice. 

We see three key weaknesses of the "pseudo-absence" approach so widely 
used in ecology for analyzing presence-only data, which we describe concisely 
as problems of model specification, interpretation, and implementation. A 
sounder model specification would involve constructing a model for the ob- 
served data y only, rather than requiring us to generate new data yo prior 
to constructing a model. Interpretation of results is difficult, because some 
model parameters of interest (such as pi of Section 3) are a function of the 
number of pseudo-absences and their location. For example, we explain in 
Section 3 that as the number of pseudo-absences approaches infinity, pi — > 0, 
for a given presence-only data set y. Implementation of the approach is prob- 
lematic because it is unclear how pseudo-absences should be chosen [Elith 
and Leathwick (2007); Guisan et al. (2007); Zarnetske, Edwards and Moisen 
(2007); Phillips et al. (2009)], and one can obtain qualitatively different re- 
sults depending on the method of choice of pseudo-absences [Chefaoui and 
Lobo (2008)]. 

In this paper we make two key contributions. First, we propose point 
process models (Section 2) as an appropriate tool for species distribution 
modeling of presence-only data, given that presence-only data arise as a 
set of point events — a set of locations where a species has been reported 
to have been seen. A point process model specification addresses each of 
the three concerns raised above regarding pseudo-absence approaches. Our 
second key contribution is a proof that the pseudo-absence logistic regression 
approach, when applied with an increasing number of regularly spaced or 
randomly chosen pseudo-absences, yields estimates of slope parameters that 
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converge to the point process slope estimates (Section 3). These two key 
results have important ramifications for species distribution modeling in 
ecology (Section 5), in particular, we provide a solution to the problem of 
how to select pseudo-absences. We illustrate our results for the A. costata 
data of Figure 1(a) (Section 4). 

2. Poisson point process models for presence-only data. Presence-only 
data are a set y = {y\, . . . , y n } of point locations in a two-dimensional region 
A, where the locations where presences are recorded (the yi) are out of the 
control of the researcher, as is the total number of presence points n. We also 
observe a "map" of values over the entire region A for each of k explanatory 
variables, and we denote the values of these variables at y\ as (xa, . . . , x^). 

We propose analyzing y = {y\ , . . . , y n } as a point process, hence, we jointly 
model number of presence points n and their location (yi). This has not 
previously been proposed for the analysis of presence-only data, despite 
the extensive literature on the analysis of presence-only data. We consider 
inhomogeneous Poisson point process models [Cressie (1993); Diggle (2003)], 
which make the following two assumptions: 

1. The locations of the n point events (j/i, . . . ,y n ) are independent. 

2. The intensity at point yi [\(yi), denoted as Aj for convenience], the lim- 
iting expected number of presences per unit area [Cressie (1993)], can 
be modeled as a function of the k explanatory variables. We assume a 
log-linear specification: 



although note that the linearity assumption can be relaxed in the usual 
way (e.g., using quadratic terms or splines). The parameters of the model 
for the Aj are stored in the vector /3 = (/3o, /3i, . . . , /3&). 

Note that the process being modeled here is locations where an organism has 
been reported rather than locations where individuals of the organism occur. 
Hence, the independence assumption would only be violated by interactions 
between records of sightings rather than by interactions between individ- 
ual organisms per se. The atlas data of Figure 1 consist of 721 A. costata 
records accumulated over a period of 35 years in a region of 86,000 km 2 , so 
independence of records seems a reasonable assumption in this case, given 
the rarity of event reporting. Nevertheless, the methods we review here can 
be generalized to handle dependence between point events [Baddeley and 
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Cressie (1993) shows that the log-likelihood for y can be written as 

(2.2) i(/3;y) = X)log(A i )- / A(y)dy-log(n!). 

i=x J y^A 

Berman and Turner (1992) showed that if the integral is estimated via nu- 
merical quadrature as j ye ^X(y)dy pa Y^=i w i^ii then the log-likelihood is 
(approximately) proportional to a weighted Poisson likelihood: 

m 

(2.3) ^ PP m(/3;y,yo,w) = ^io i (z i log(A i ) - A,), 

i=l 

where Zj = ^ y o _ |y n+li , . , ,y m } are quadrature points, the vec- 

tor w = (wi, . . . ,w m ) stores all quadrature weights, and /(•) is the indicator 
function. Being able to write l((3;y) as a weighted Poisson likelihood has 
important practical significance because it implies that generalized linear 
modeling (GLM) techniques can be used for estimation and inference about 
/3. Further, adaptations of GLM techniques to other settings, such as gen- 
eralized additive models [Hastie and Tibshirani (1990)], can then be readily 
applied to Poisson point process models also. 

Before implementing this approach, however, we need to make two key 
decisions — how to choose quadrature points yo = {y n +i, ■ ■ ■ ,Um} and how to 
calculate the quadrature weight wi at each point y\. 

We propose choosing quadrature points in a regular rectangular grid, and 
considering grids of increasing spatial resolution until the estimate of the 
maximized log-likelihood l P pm(P',y,yo, w) has converged. A rectangular grid 
provides reasonably efficient coverage of the region A, and is an arrangement 
for which environmental data Xn, . . . ,Xik can be easily obtained via GIS 
software. We illustrate this method in Section 4. Note a large data set may 
be required — in Section 4 convergence was achieved at a spatial scale that 
required inclusion of approximately 86,000 quadrature points. 

Quadrature weights are calculated as the area of the neighborhood A{ 
around each point yi, according to some definition of the Ai such that yi 6 Ai 
for each i, Ai n Ay = for each % ^ i' ', and (Ji^i = A. In Section 4 we 
calculated quadrature weights using the tiling method implemented in the 
R package spatstat [Baddeley and Turner (2005)]. This crude approach 
breaks the region A into rectangular tiles and calculates the weight of a 
point as the inverse of the number of points per unit area in its tile. We 
fixed tile size at the size of the regular grid used to sample quadrature 
points, such that all tiles contained exactly one quadrature point. Dirichlet 
tessellation [Baddeley and Turner (2005)] offers an alternative method of 
estimating weights, but this was not practical for our sample sizes. 
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3. Asymptotic equivalence of pseudo-absence logistic regression and Pois- 
son point process models. Ecologists typically analyze presence-only data 
points y = {y%, . . . ,y n } by generating a set of "pseudo-absence" points yo = 
> • • • j Vm}, then using logistic regression to model the "response vari- 
able" I(i 6 {l,...,n}) as a function of explanatory variables [Pearce and 
Boyce (2006)]. Note that I(i € {1, . . . , n}) is not actually a stochastic quan- 
tity, nevertheless, the use of logistic regression to model this quantity as a 
Bernoulli response variable can be motivated via a case-control argument 
along the lines of Diggle (2003), Section 9.3. 

In this section we will show that the approach to analysis currently used 
in ecology, logistic regression using pseudo-absences, is closely related to 
the Poisson point process model introduced in Section 2. Specifically, if the 
pseudo-absences are either generated on a regular grid or completely at 
random over the region A, then as the number of pseudo-absences increase, 
all parameter estimators except for the intercept in the logistic regression 
model converge to the maximum likelihood estimators of the Poisson process 
model of Section 2. This asymptotic relationship between logistic regression 
and Poisson point process models does not appear to have been recognized 
previously in the literature. 

First, we will specify a probability model for I{i £ {1, . . . , n}) that permits 
a logistic regression model, and the study of its properties as m — > 00. This 
can be achieved by considering a point chosen at random from {y±, . . . ,y m } 
and defining U as the event that the randomly chosen point yi is a presence. 
We are interested in modeling U conditionally on the explanatory variables 
observed at the randomly chosen point, Xj = (xn, . . . ,2^). In this setting, 
U is a Bernoulli variable with conditional mean pi, and we assume that 



where and /o(-) are the densities of Xj conditional on U = 1 and U = 
respectively. Provided that /o(xj|£7 = 0) is not a function of m (which 
is ensured, e.g., by using an identical process to select all pseudo-absence 
points), then the odds of a presence point is a function of m only via 
the multiplier (m — n)~ l . 

It can be seen from equation (3.2) that if m —> 00 in such a way that 
/o(xj|C7 = 0) is not a function of i, then pi — > at an asymptotic rate that is 
proportional to m _1 , and the intercept term in the logistic regression model 



(3.1) 





■ n) because 

/ 1 (x^ = l) n 
/ (x;|l7 = 0) m-n 



I - Pi / (x i |C/ = 0) P(U = 0) 
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approaches — oo at the rate log(m). This in turn means that the logistic 
regression log-likelihood, defined below, will also diverge as m — > oo: 

n m 

(3-3) ^bin(7;y,yo) = ^ lo g(^)+ lo g( 1- ^)- 

i=l i=n+l 

Clearly, as pi — > 0, log(pj) — > — oo and, hence, ^bin(7! Yi yo) — > — oo. Such di- 
vergence is a symptom that the original model has been incorrectly specified. 
The use of the more appropriate spatial point process model of Section 2 will 
not encounter such problems. However, it is shown in the following theorems 
that despite the problems inherent in the logistic regression model specifica- 
tion, and despite divergence of the intercept term, the remaining parameters 
converge to the corresponding parameters from the Poisson process model 
of equation (2.3). Further, pseudo- absences play the same role in the logistic 
regression model that quadrature points played in Section 2. 

For notational convenience, we will define J m to be the single-entry matrix 
whose first element is logm: 

(3.4) J m = (logm,0,...,0). 

This definition will be used in each of the theorems that follow. The J m 
notation is immediately useful in writing out the parameters of the model 
for pi in equation (3.1) as 7 - J m _ n where 7 = (7o> 7l> • • • >7fc)- 

Theorem 3.1. Consider a fixed set of n observations from a point 
process y = {yy,. . .,y n }, and a set of pseudo-absences y = {y n +i, ■ ■ -,11m} 
of variable size that is chosen via some identical process on A for i € 
{n + l,...,m}. We model U, whether or not a randomly chosen point is 
a presence point, via logistic regression as in equation (3.1). 

As m — > oo, the logistic regression log-likelihood of equation (3.3) ap- 
proaches the Poisson point process log-likelihood [equation (2.3)] but with 
all quadrature weights set to one: 

^bin(7;y 5 yo) = Wn(7 - ^ m ;y,yo, i) + 0(m~ 1 ), 

where 1 is a m-vector of ones, and J m is defined in equation (3.4). 

The proofs to Theorem 3.1 and all other theorems are given in Ap- 
pendix A. 

Theorem 3.1 has two interesting practical implications. 

First, it implies that the pseudo- absence points of presence-only logistic 
regression play the same role as quadrature points of a point process model, 
and so established guidelines on how to choose quadrature points (such as 
those of Section 2) can inform choice of pseudo-absences. Previously pseudo- 
absences have been generated according to ad hoc recommendations [Pearce 
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Fig. 2. Asymptotic behavior of Poisson point process and pseudo-absence logistic regres- 
sion models when the number of quadrature points becomes large (via sampling in a regular 
grid with increasing spatial resolution), (a) The maximized log-likelihood converges for a 
Poisson point process, but not for pseudo-absence logistic regression, (b) The parameters 
and their standard errors converge for Poisson point process and logistic regression mod- 
els, for sufficiently high spatial resolution. Linear coefficient of "minimum temperature" 
is given here (corresponding to the second entry in Table 2). 



and Boyce (2006); Zarnetske, Edwards and Moisen (2007)], given the lack 
of a theoretical framework for their selection. In contrast, quadrature points 
are generated in order to estimate the log-likelihood to a pre-determined 
level of accuracy, a criterion which guides the choice of locations and num- 
bers of quadrature points m — n, as explained in Section 2 and as illustrated 
later in Section 4 [Figure 2(a)]. Interestingly, current methods of selecting 
pseudo-absences in ecology [Pearce and Boyce (2006); Zarnetske, Edwards 
and Moisen (2007)] do not appear to be consistent with the best practice in 
low-dimensional numerical quadrature — points are usually selected at ran- 
dom rather than on a regular grid, and the number of pseudo-absences 
(m — n) is more commonly chosen relative to the magnitude of the num- 
ber of presences (n) rather than based on some convergence criterion as in 
Figure 2(a). 

Second, Theorem 3.1 implies that despite the apparent ad hoc nature of 
the pseudo-absence approach, some form of point process model is being es- 
timated. However, logistic regression is only equivalent to a Poisson point 
process when w = 1, that is, all quadrature weights are ignored. The impli- 
cations of ignoring weights is considered in Theorems 3.2 and 3.3 below. 

It should also be noted that Theorem 3.1 is closely related to results due 
to Owen (2007) and Ward (2007), although Theorem 3.1 differs from these 
results by relating pseudo-absence logistic regression specifically to point 
process modeling. 
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Owen (2007) also considered the logistic regression setting where the num- 
ber of presence points is fixed, and the number of pseudo-absences increases 
to infinity, and referred to this as "infinitely unbalanced logistic regression." 
Owen (2007) derived conditions under which convergence of model parame- 
ters could be achieved as the number of pseudo-absences increased. The key 
condition is that the centroid of the points {y%, . . . ,y n } in the design space 
is "surrounded" — see Definition 3 of Owen (2007) for details. 

Along the lines of Ward et al. (2009), Ward (2007) considered a pseudo- 
absence logistic regression formulation of the presence-only data problem, 
and defined the "population logistic model" as the model across "the full 
population" of locations in the region A. The unconstrained log-likelihood 
of the population logistic model [Ward (2007), equation (7.6)] has a similar 
form to the point process log-likelihood Z(/3;y) of Section 2. For presence- 
only logistic regression as in Ward et al. (2009), Ward (2007) shows that as 
the number of pseudo-absences approaches infinity, the log-likelihood con- 
verges to that of the population logistic model, a result that is analogous to 
Theorem 3.1. 

Having shown that pseudo-absence logistic regression is equivalent to a 
point process model where weights are ignored, the implications of ignoring 
weights is now considered in Theorems 3.2 and 3.3. 

Theorem 3.2. Consider a point process model with quadrature points 
yn+i, ■ ■ ■ ,Um selected such that for all i Wi = —, where \A\ is the total area 
of the region A. Assume also that the design matrix X has full rank. 

The maximum likelihood estimators o/Z ppm (7 — J m ;y,yo, 1) and £ ppm (/?; y, 
yo, m 1 )' l~ J m and ft respectively, satisfy 

1 = P + J\A\- 

Further, the Fisher information for 7 and /? is equal. 

That is, provided that quadrature points have been selected such that quadra- 
ture weights are equal, ignoring quadrature weights in a Poisson point process 
model does not change slope parameters nor their standard errors, although 
the intercept term will differ by log(|*4|/m). 

Theorem 3.2 refers to the special case where all points (including presence 
points) are sampled on a regular grid. This arises in the special case where 
the region A has been divided into grid cells of equal area, and each grid 
cell is assigned the value 1 only if it contains a presence point. This form 
of presence-only analysis is sometimes used in ecology [Phillips, Anderson 
and Schapire (2006), e.g.]. In addition, the setting of Theorem 3.2 provides 
a reasonable approximation to the approach to quadrature-point selection 
proposed in Section 2. 
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Note that together Theorems 3.1-3.2 suggest that when quadrature points 
(or, equivalently, pseudo-absences) are sampled in a regular grid at increas- 
ing resolution, the logistic regression parameter estimates and their standard 
errors will approach those of the point process model — with the exception 
of the intercept term, which diverges slowly to — oo as all pi — > at a rate 
inversely proportional to m. This nonconvergence of the intercept was also 
noticed by Owen (2007). Figure 2 illustrates these results for the A. costata 
data. 

Theorem 3.3 below links the above results with the case where pseudo- 
absences are randomly sampled within the region A, which is a more com- 
mon approach in ecology than sampling on a regular grid [e.g., Elith and 
Leathwick (2007); Hernandez et al. (2008)]. 

Theorem 3.3. Consider again the conditions of Theorem 3.2, but now 
assume that the quadrature points yo are selected uniformly at random within 
the region A. As previously, 7 is the maximum likelihood estimator o//p pm (7 — 
J m ;y,yo,l), but now let (3 be the maximum likelihood estimator of l((3;y) 
from equation (2.2). As m — > 00, 

That is, if quadrature points are randomly selected instead of being sampled 
on a regular grid, the result of Theorem 3.2 holds in probability rather than 
exactly. 

Note that the stochastic convergence in Theorem 3.3 is with respect to m 
not n, that is, it is conditional on the observed point process. 

Note also that one can think of randomly selecting pseudo-absence points 
as an implementation of "crude" Monte Carlo integration [Lepage (1978)] 
for estimating J ^X(y)dy in the point process likelihood [equation (2.2)]. 

4. Modeling Angophora costata species distribution. As an illustration, 
we construct Poisson point process models for the intensity of Angophora 
costata records as a function of a set of explanatory variables. We consider 
modeling the log of intensity using linear and quadratic functions of the vari- 
ables minimum and maximum temperature, mean annual rainfall, number 
of fires since 1943, and "wetness," a coefficient which can be considered as an 
indicator of local moisture. These five variables were recommended by local 
experts as likely to be important in determining A. costata distribution. 

Our full model for intensity of A. costata records at the point yi has the 
following form: 

(4.1) log(A i ) = /3 + xf/3 1 +xfBx 4 , 
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Table 1 

AIC values for linear and quadratic Poisson point process models 
for log(A) of Angophora costata presence. Model fitted at the 1 km 
by 1 km resolution 



Model 


AIC 


Linear terms only 


5363.6 


Quadratic (additive terms only) 


4763.4 


Quadratic (interactions included) 


4400.6 



where f3\ is a vector of linear coefficients, B is a matrix of quadratic coeffi- 
cients, and Xj is a vector containing measurements of the five environmental 
variables at point yi. We consider a quadratic model for log(Aj) because 
this enables fitting a nonlinear function and interaction between different 
environmental variables, both considered important in species distribution 
modeling [Elith, Leathwick and Hastie (2008)]. 

All analyses were carried out using purpose-written code on the R program 
[R Development Core Team (2009)]. 

We first considered the spatial resolution at which quadrature points 
needed to be sampled in order for the log-likelihood l(/3;y) to be suitably 
well approximated by Z ppm (/3; y, yo, w). We found [as in Figure 2(a)] that on 
increasing the number of quadrature points, the estimate of the maximized 
log-likelihood converged, and that there was minimal change in the solution 
beyond a resolution of one quadrature point every 1 km (the maximized log- 
likelihood changed by less than one when the number of quadrature points 
was increased 4- fold). Hence, the 1 km resolution was used in model- fitting, 
and these results are reported here. This involved a total of 86,227 quadra- 
ture points. 

In order to study which environmental variables are associated with A. 
costata and how they are associated, we performed model selection where 
we considered different forms of models for log-intensity as a function of 
environmental variables, and we considered different subsets of the envi- 
ronmental variables via all-subsets selection. In both cases we used AIC as 
our model selection criterion, a simple and widely-used penalty-based model 
selection criterion [Burnham and Anderson (1998)]. 

Comparison of AIC values for linear and quadratic models suggest that a 
much better fit is achieved when using a quadratic model with interactions 
terms for all coefficients (Table 1). Hence, we have evidence that environmen- 
tal variables interact in their effect on A. costata. Judging from the model 
coefficients and their relative size compared to standard errors, the interac- 
tions between maximum temperature, minimum temperature, and annual 
rainfall appear to be the major contributors. 
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All-subsets selection considered a total of 32 models, and found that the 
best-fitting model included four variables (Figure 3) — all except for "wet- 
ness." Parameter estimates and standard errors for this best-fitting model 
are given in Table 2(a). 

An image of the fitted intensity surface from the best-fitting model is 
presented in Figure 1 (c) . The regions of highest predicted intensity are near 
the coast and just north of Sydney, which are indeed where the highest den- 
sity of presence points appeared in Figure 1(a). We also compared intensity 
surfaces fitted at different spatial resolutions, and note that they appear 
identical when quadrature points are selected in a 500 x 500 m, 1 x 1 km, or 
2x2 km grid, and that irrespective of spatial resolution, regions of higher in- 
tensity had 0.05-0.2 expected A. costata records per square kilometer, as in 
Figure 1(c). Note this is in contrast to logistic regression, where fitted prob- 
abilities in any given location are a function of number of pseudo-absences, 
and vary by a factor of 16 when moving from a 2 x 2 km to a 500 x 500 m 
grid. 

To assist in interpreting parameters from the best-fitting model, we have 
constructed image plots of the fitted intensity in "environmental space" to 
elucidate the nature of the effect of each environmental variable on intensity 
of A. costata records (Figure 4). It can be seen in Figure 4 that there is 
a strong and negatively correlated response to maximum temperature and 
annual rainfall. Of the four environmental variables, the response to number 
of fires appears to be the weakest, with little apparent change in predicted 
intensity as number of fires increased, and no observable interaction with 
the three climatic variables. 




i 1 1 1 r 

1 2 3 4 5 

Number of explanatory variables 



Fig. 3. Results of all-subsets selection, expressed as AIC of the best-fitting model at each 
level of complexity. The respective best-fitting models included minimum temperature, then 
minimum and maximum temperature, then annual rainfall was added, then fire count, and 
finally wetness. The best-fitting model included four explanatory variables ( all variables 
except wetness). 
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Table 2 

Parameter estimates and their standard errors for (a) the Poisson point process model 
with a 1 x 1 km regular grid of quadrature points; (b) The logistic regression model with 
a 1 X 1 km regular grid of pseudo-absences; (c) The logistic model with 1000 randomly 
chosen pseudo-absence points. In each case we fitted a quadratic model of minimum 
temperature (MNT), maximum temperature (MXT), annual rainfall (RA), and fire count 
(FC). Notice that with few exceptions, terms are equivalent to 2-3 significant figures for 
the models fitted over a regular grid. But this is not the case for (c), and, in particular, 
standard errors are all 30-80% larger 



Term 


(a) 




(b) 




(c) 




$i 




Pi 


se0j) 


03 


«e(&) 


Intercept 


-2130 


169.4 


-2119 


171 


-1999 


227 


MNT 


-16.3 


3.0 


-16.2 


3.0 


-9.91 


4.2 


MNT 2 


-0.21 


0.027 


-0.205 


0.028 


-0.185 


0.050 


MXT 


128.7 


10.1 


128.1 


10.1 


120.2 


13 


MNT * MXT 


0.539 


0.090 


0.535 


0.091 


0.377 


0.13 


MXT 2 


-1.98 


0.15 


-1.97 


0.15 


-1.84 


0.20 


RA 


0.759 


0.065 


0.755 


0.066 


0.714 


0.089 


MNT * RA 


0.00345 


0.00065 


0.00339 


0.00065 


0.00147 


0.00096 


MXT * RA 


-0.0218 


0.0019 


-0.0216 


0.0019 


-0.0203 


0.0025 


RA 2 /1000 


-0.0819 


0.0072 


-0.0815 


0.0072 


-0.0749 


0.010 


FC 


6.24 


3.37 


5.98 


3.42 


4.08 


4.9 


MNT * FC 


-0.101 


0.040 


-0.101 


0.041 


-0.207 


0.070 


MXT * FC 


-0.123 


0.10 


-0.115 


0.010 


-0.0601 


0.15 


RA * FC 


-0.00174 


0.00066 


-0.00171 


0.00067 


-0.000952 


0.00095 


FC 2 


-0.127 


0.024 


-0.123 


0.024 


-0.107 


0.041 



For the purpose of comparison, parameter estimates and their standard 
errors are reported not just for the point-process model fit, but also for the 
analogous logistic regression model in Table 2(b), for a model fitted with 
quadrature points sampled in a 1 x 1 km regular grid. Note that most param- 
eter estimates and standard errors differ by less than 1% between the logistic 
regression and point process models, as expected given Theorems 3.1-3.2. 

We also report results when logistic regression is applied to m — n = 1000 
pseudo-absences at randomly selected locations [Table 2(c)]. This is at the 
lower end of the range typically used [Pearce and Boyce (2006); Elith and 
Leathwick (2007); Hernandez et al. (2008)] for pseudo-absence selection in 
ecology. Note that the standard errors are substantially larger in this case, 
and no parameter estimates are correct past the first significant figure. This 
result exemplifies how current practice in ecology regarding the number of 
pseudo-absences (m — n) can lead to poor results. Instead, it is advisable 
to consider the sensitivity of results to different choices of m — n, along the 
lines of Figure 2. 
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Fig. 4. Image plots of the joint effects of environmental variables on predicted log (in- 
tensity) of A. costata records. Darker areas of the image correspond to higher predicted 
values o/log(Ai). Note the strong and highly correlated response of intensity to maximum 
temperature and annual rainfall, and the relatively weak response to # fires. 



To explore the goodness of fit of the best-fitting model, an inhomogeneous 
if -function [Baddeley, Moller and Waagepetersen (2000)] was plotted using 
the kinhom function from the spatstat package on R [Baddeley and Turner 
(2005)], and simulation envelopes around the fitted model were constructed. 
See Diggle (2003) for details concerning the use of if-functions to explore 
goodness of fit of point process models. The inhomogeneous if -function, as 
its name suggests, is a generalization of the if -function to the nonstationary 
case. It is defined over the region A as 
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Fig. 5. Goodness of fit plot of the quadratic Poisson process model — inhomogeneous 
K -function (solid line) with simulation envelope (broken lines). The envelope gives 95% 
confidence bands as estimated from 500 simulated data sets. Note that the K -function falls 
within those bounds over most of its range, although with a possible departure for r < 6 
km. 



-R'inhom reduces to the usual K function for a stationary process, and like the 
usual /^-function, can be used to diagnose whether there are interactions in 
the point pattern y [Baddeley, Moller and Waagepetersen (2000)]. Results 
(Figure 5) suggest a reasonable fit of this model to the data, although with 
some lack of fit at small spatial scales (r < 6 km) suggestive of possible 
clustering in the data. One possible method of modeling this clustering is 
to fit an area-interaction process [Baddeley and van Lieshout (1995)], using 
the spatstat package. We have repeated analyses using such a model and 
found results to be generally consistent with those presented here, except of 
course that the equivalence with logistic regression no longer holds. 

5. Discussion. In this paper we have proposed the use of Poisson point 
process models for the analysis of presence-only data in ecology, an impor- 
tant and widely-studied problem to which this methodology is well suited. 
We have also shown that this method is approximately equivalent to logistic 
regression, when a suitable number of regularly or randomly spaced pseudo- 
absences are used, hence, we provide a link between the proposed method 
and the approach most commonly used in ecology at the moment. But this 
raises the question: why use point process models, if the method currently 
being used is (asymptotically) equivalent to logistic regression anyway? Sev- 
eral reasons are listed below. 

Recall that in Section 1 we argued that the pseudo-absence approach 
has problems with model specification, interpretation, and implementation. 
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We argue that each of these difficulties is resolved by using a point process 
modeling framework. Model specification — we believe that a point process 
model as in Section 2 is a plausible model for the data generation mech- 
anism for presence-only data. In contrast, the logistic regression approach 
involves generating new data in order to fit a model originally designed for 
a different problem (analysis of binary data not analysis of point-events). 
Hence, the pseudo- absence approach as it is usually applied appears to in- 
volve coercing the data to fit the model rather than choosing a model that 
fits the original data. Interpretation — in the logistic regression approach we 
model pi, the probability that a given point event is a presence not a pseudo- 
absence. This quantity has no physical meaning and clearly its interpretation 
is sensitive to our method of choice of pseudo-absences (and typically each 
Pi — > as m — > oo). In contrast, the intensity at a point \ has a natural 
interpretation as the (limiting) expected number of presences per unit area, 
and will not be sensitive to choice of quadrature points, provided that the 
number of quadrature points is sufficiently large. Implementation — in Sec- 
tion 2 we explain that point process models offer a framework for choosing 
quadrature points. Specifically, equation (2.3) is used to estimate the point 
process log-likelihood, and progressively more quadrature points are added 
until convergence of ^ PP m(/3;y,yo,w) is achieved as in Figure 2(a). No such 
framework for choice of pseudo-absences is offered by logistic regression, and 
instead choice of the location and number of pseudo-absences is ad hoc, with 
potentially poor results [Table 2(c)]. Ecologists are concerned about the is- 
sues of how many pseudo-absences to choose [Pearce and Boyce (2006)], 
where to put them [Elith and Leathwick (2007); Zarnetske, Edwards and 
Moisen (2007); Phillips et al. (2009)], and what spatial resolution to use in 
model-fitting [Guisan et al. (2007); Elith and Leathwick (2009)], all issues 
that have natural solutions given a point process model specification of the 
problem, as in Section 2. 

It should be emphasized that we have demonstrated equivalence of point 
process modeling and pseudo-absence logistic regression only for large num- 
bers of pseudo-absences and only for pseudo-absences that are either regu- 
larly spaced or located uniformly at random over A. Current practice con- 
cerning selection of pseudo-absences in the ecology literature does not always 
involve sampling at random over A [e.g., Hernandez et al. (2008)] and does 
not involve sampling sufficiently many pseudo-absences for model conver- 
gence. Instead, choice of the number of pseudo-absences is ad hoc, and a 
total of 1000-10,000 pseudo-absences is usually used [Elith and Leathwick 
(2007); Hernandez et al. (2008)], although sometimes even fewer [Zarnetske, 
Edwards and Moisen (2007)]. On Figure 2, 1000-10,000 corresponds to a res- 
olution of about 4-8 km, for which model convergence has not been achieved. 
When fitting a model using just 1000 pseudo absences, some parameter es- 
timates are not equivalent to the high-resolution fits to even one significant 
figure, and all standard error estimates were larger by 30-80% (Table 2). 
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While only Poisson point processes were considered in this paper, the 
methodology implemented in Section 4 can be generalized to incorporate in- 
teractions between points in a straightforward fashion [Baddeley and Turner 
(2005)]. However, the links between point process models and logistic regres- 
sion identified in Section 3 may be lost in this more general setting. 

One issue not touched on in this paper is the problem of observer bias — 
that the likelihood of a species being reported is a function of additional 
variables related to properties of the observer and not of the target species, 
such as variation in the level of accessibility of different parts of the region 
A. For example, the high number of A. costata records just north of Sydney 
may be due in part to proximity to a large city, rather than simply being due 
to environmental conditions being suitable for A. costata. This issue will be 
addressed in a related article. 

APPENDIX A: PROOF OF THEOREMS 

A.l. Proof of Theorem 3.1. The proof involves two steps. The first step 
involves showing that ibin(")j as a function of pi, is asymptotically equivalent 
to lpp m (-) when written as a function of Aj. The second step involves showing 
that given the definitions of pi and Aj in equations (2.1) and (3.1), we can 
replace one with the other without affecting the order of approximation. 

Specifically, the log-likelihood function for U can be written as 

n m 

^bm(7;y,yo) = ^ lo §(^) + ^2 1o s( 1_ k) 

i=l i=n+l 

and a Taylor expansion of log(l — pi) yields 

n m 

= ^io g ( K )+ to + °(ft 2 )}> 

i=l i=n+l 

but it can be seen from equation (3.2) that pi = Ofa" 1 ) and, hence, ^27=iPi = 
0{m~ l ) for fixed n, so 

n m 

i=l i=n+l 
n m 

(A.l) =Y,^g(pi) + ^2p i + 0(m- 1 ) 

i=l i=l 
in 

= G {1, . . . , n}) log( Pi ) - Pi } + CKrrr 1 ). 

i=i 

Note that equation (A.l) has the form of the Poisson point process log- 
likelihood, but with all weights set to one and pi being used in place of 
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Aj. We will now derive a relation between pi and Aj which motivates the 
replacement of pi by Aj . 

First note that the Taylor expansion of log(l — x) implies both that log(l — 
Pi) = 0(?n~ 1 ), and that log(m — n) = log(m) + log(l — n/m) = log(m) + 
0(m~ l ). So from equation (3.1), 

k 

log ^ = 70 - log(m - n) + - log(l - pi) 

j'=i 

k 

= 7o - log(m) + ^Xij-fj + C^m" 1 ). 

3=1 

This has the form of equation (2.1), where /3 = 7 — J m . So when (3 = 7 — J m , 
lo gK = log A, + 0(m" 1 ), and £™ iPi = £™ 1 Ml + O^" 1 } = EI^i ^ + 
0{m~ l ). Now plugging these results into equation (A.l) yields / ppm (7 — 
<An; y,y0) 1) + 0(m~ l ), completing the proof. □ 



A.2. Proof of Theorem 3.2. The proof follows by inspection of the score 
equations. Specifically, let Sj(f3; w) = -^-l ppm (f3; y, yo, w). From equation (2.c 

for j e{l,...,k}, 

m / \ \ m 

_ , / 7i — A - \ , 

jWi{zi - Xi) 



( z- — A \ 

(A.2) Sj((3;w) = '^2x ij \ i w i ( 1 - J =^Xy? 

i=i \ » / i=1 

where Zj = J "' ■ If i = 0, equation (A.2) holds but with Xij = 1 for each 
i. 

Now /3 satisfies Sj(/3; ^-1) = for each j, that is, 
(A.3) = f2x l Jl(iel,...,n)-\ i ^\ 

where from equation (2.1), log(Aj) = /3o + Sj=i x ijPi- 
7 satisfies Sj(^ — J m ; 1) = 0, for each j, 

m 

(A.4) = ^x ii (J(iel J ... ) n)-A i ), 

where Aj is the maximum likelihood estimator of Aj for £ ppm (7 — J m ; y , yo , 1 ) , 
which satisfies log(Aj) = 70 — log m + X)jLi x ij7i • 

The solutions to equations (A.3) and (A.4) are related by the identity 
Aj = Aj^- for each i, and if we take the logarithm of both sides, 

k k 

70 - logm + ^ x ijlj = A) + ^2 Xi $i + log I" 4 ! ~~ lo g m - 
i=i i=i 
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Provided that the design matrix X has full rank, 7 = $ + J\ A \ ■ 

Also, note that the (j,j')th element of the Fisher information matrix of 

^ PP m(/3;y,yo,w) is 

(A.5) Ijf (P;v/) = -E\^ g p ^ ^ lppm(P; y, yo, w) j = ^XyWiXy/Ai 

and so Jjy(jS; ^1) = ^I^i x ij x ij'm^ = XXl x ij x ij'\ = Ijj'il ~ J m\ 1) for 
each (j,f). This completes the proof. □ 

A.3. Proof of Theorem 3.3. Let 5 = ^-/3 — J\ A \ . We will prove the the- 
orem by using a Taylor expansion of the score equations for Z ppm (/3; y, yo, 1) 

V 

to show that for fixed n and m — > 00, 5 — )• 0. 

Let S(/3; 1) be the vector of score equations whose jth element is Sj 1) = 
^-^ PP m(/3; Y) Yo> 1) an d l e t be the corresponding Fisher information 

matrix. A Taylor expansion of S(7 — J m ; 1) about S(/3 + J|^|/ m ; 1) yields 

(A.6) S( 7 - J m ; 1) = S(/3 + J\ A \ /m - 1) - I(/3 + J\ A \ /m - 1)5 + O p ( 



The left-hand side is zero, because it is evaluated at the maximizer of 
^ P pm(/?;y,yo, !)• Also, evaluating Aj at (3 + J|^|/ m gives Ai^, and substi- 
tuting this into equation (A. 2) at w = 1, 

\A\ 



n 



S j(P + J\A\/m'i 1) — ^tj ^ ^ x ij^i 

i=l 

Xj 



V 



lyeA 

from the weak law of large numbers. But this is the derivative of l((3;y) 
from equation (2.2), evaluated at the maximum likelihood estimate /3, and 

so it equals zero for each j and, hence, S(/3 + J\ A \/ m ; 1) — > 0. Similarly, for 

each (j, j ), Ijj>(/3 + </|.4|/ m ; 1) — >• Ijj>((3), the (j, j )th element of the Fisher 
information matrix for (3 from /(/3;y). So returning to equation (A.6), 

5 = 10 + J lAl/m ; l)-^ + J lAl/m ; 1) + O p (\\5\\ 2 ) % 0, 
completing the proof. □ 
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